Scalable and Adaptive Online Joins
Authors
Abstract
Scalable join processing in a parallel shared-nothing environment requires a partitioning policy that evenly distributes the processing load while minimizing the size of state maintained and number of messages communicated. Previous research proposes static partitioning schemes that require statistics beforehand. In an online or streaming environment in which no statistics about the workload are known, traditional static approaches perform poorly. This paper presents a novel parallel online dataflow join operator that supports arbitrary join predicates. The proposed operator continuously adjusts itself to the data dynamics through adaptive dataflow routing and state repartitioning. The operator is resilient to data skew, maintains high throughput rates, avoids blocking behavior during state repartitioning, takes an eventual consistency approach for maintaining its local state, and behaves strongly consistently as a black-box dataflow operator. We prove that the operator ensures a constant competitive ratio of 3.75 in data distribution optimality and that the cost of processing an input tuple is amortized constant, taking into account adaptivity costs. Our evaluation demonstrates that our operator outperforms the state-of-the-art static partitioning schemes in resource utilization, throughput, and execution time.
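The abstract does not spell out the partitioning scheme, so the following is only an illustrative sketch of one well-known way to partition a join with an arbitrary predicate: arrange the machines as a rows-by-columns grid, send each tuple of one input to every machine in one row, and each tuple of the other input to every machine in one column. The function names (`choose_grid`, `route_r`, `route_s`) and the per-machine cost model `r_size / rows + s_size / cols` are assumptions made for illustration, not the operator described in the paper.

```python
import random
from math import inf


def choose_grid(r_size, s_size, machines):
    """Pick (rows, cols) with rows * cols == machines that minimizes the
    assumed per-machine input load r_size / rows + s_size / cols."""
    best, best_cost = None, inf
    for rows in range(1, machines + 1):
        if machines % rows:
            continue  # only exact factorizations of the machine count
        cols = machines // rows
        cost = r_size / rows + s_size / cols
        if cost < best_cost:
            best, best_cost = (rows, cols), cost
    return best, best_cost


def route_r(rows, cols, rng):
    """Send a tuple of input R to every machine in one random row."""
    row = rng.randrange(rows)
    return [(row, c) for c in range(cols)]


def route_s(rows, cols, rng):
    """Send a tuple of input S to every machine in one random column."""
    col = rng.randrange(cols)
    return [(r, col) for r in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Balanced inputs favor a square grid; skewed sizes favor a long, thin one.
    print(choose_grid(1000, 1000, 16))
    print(choose_grid(16000, 1000, 16))
    (rows, cols), _ = choose_grid(1000, 1000, 16)
    # Any R tuple and any S tuple meet on exactly one machine.
    print(set(route_r(rows, cols, rng)) & set(route_s(rows, cols, rng)))
```

Under this assumed cost model, the best grid shape changes as the relative input sizes change, which is why a shape fixed in advance from statistics can become a poor fit when the workload is not known up front.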
Similar resources
Squall: Scalable Real-time Analytics using Efficient, Skew-resilient Join Operators
Squall is a scalable online query engine that runs complex analytics in a cluster using skew-resilient, adaptive operators. Online processing implies that results are incrementally built as the input arrives, and it is ubiquitous for many applications such as algorithmic trading, clickstream analysis and business intelligence (e.g., in order to reach a potential customer during the active sessio...
Adaptive Group Key Management Protocol for Wireless Communications
Group-oriented services and wireless communication networks are among the emerging technologies of the last few years. Group key management, which is an important building block in securing group communication, has received particular attention in both academic and industry research communities. This is due to the economic relevance of group-based applications, such as video on demand, video...
Convergent Inference with Leaky Joins
Over the past decade, a class of model database engines like BayesStore, MauveDB, and numerous others has emerged, allowing users to interact with probabilistic graphical models through queries. A key task for model databases, computing marginal probabilities, grows exponentially in the complexity of the graph. Although exact solutions are feasible for smaller graphs, for larger graphs approxim...
Adaptive Load Diffusion for Stream Joins
Data stream processing has become increasingly important as many emerging applications call for sophisticated real-time processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins are among the most important stream processing operations and can be used to detect linkages and correlations between different data streams. ...
Optimal adaptive leader-follower consensus of linear multi-agent systems: Known and unknown dynamics
In this paper, the optimal adaptive leader-follower consensus of linear continuous time multi-agent systems is considered. The error dynamics of each player depends on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...
Journal title:
- PVLDB
Volume 7, Issue
Pages -
Publication date 2014